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   A Framework for Defining Empirical Bulk Transfer Capacity Metrics

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document defines a framework for standardizing multiple BTC
   (Bulk Transport Capacity) metrics that parallel the permitted
   transport diversity.

1   Introduction

   Bulk Transport Capacity (BTC) is a measure of a network's ability to
   transfer significant quantities of data with a single congestion-
   aware transport connection (e.g., TCP).  The intuitive definition of
   BTC is the expected long term average data rate (bits per second) of
   a single ideal TCP implementation over the path in question.
   However, there are many congestion control algorithms (and hence
   transport implementations) permitted by IETF standards.  This
   diversity in transport algorithms creates a difficulty for
   standardizing BTC metrics because the allowed diversity is sufficient
   to lead to situations where different implementations will yield
   non-comparable measures -- and potentially fail the formal tests for
   being a metric.

   Two approaches are used.  First, each BTC metric must be much more
   tightly specified than the typical IETF protocol.  Second, each BTC
   methodology is expected to collect some ancillary metrics which are
   potentially useful to support analytical models of BTC.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  Although
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   [RFC2119] was written with protocols in mind, the key words are used
   in this document for similar reasons.  They are used to ensure that
   each BTC methodology defined contains specific pieces of information.

   Bulk Transport Capacity (BTC) is a measure of a network's ability to
   transfer significant quantities of data with a single congestion-
   aware transport connection (e.g., TCP).  For many applications the
   BTC of the underlying network dominates the overall elapsed time for
   the application to run and thus dominates the performance as
   perceived by a user.  Examples of such applications include FTP, and
   the world wide web when delivering large images or documents.  The
   intuitive definition of BTC is the expected long term average data
   rate (bits per second) of a single ideal TCP implementation over the
   path in question.  The specific definition of the bulk transfer
   capacity that MUST be reported by a BTC tool is:

      BTC = data_sent / elapsed_time

   where "data_sent" represents the unique "data" bits transfered (i.e.,
   not including header bits or emulated header bits).  Also note that
   the amount of data sent should only include the unique number of bits
   transmitted (i.e., if a particular packet is retransmitted the data
   it contains should be counted only once).

   Central to the notion of bulk transport capacity is the idea that all
   transport protocols should have similar responses to congestion in
   the Internet.  Indeed the only form of equity significantly deployed
   in the Internet today is that the vast majority of all traffic is
   carried by TCP implementations sharing common congestion control
   algorithms largely due to a shared developmental heritage.

   [RFC2581] specifies the standard congestion control algorithms used
   by TCP implementations.  Even though this document is a (proposed)
   standard, it permits considerable latitude in implementation.  This
   latitude is by design, to encourage ongoing evolution in congestion
   control algorithms.

   This legal diversity in congestion control algorithms creates a
   difficulty for standardizing BTC metrics because the allowed
   diversity is sufficient to lead to situations where different
   implementations will yield non-comparable measures -- and potentially
   fail the formal tests for being a metric.

   There is also evidence that most TCP implementations exhibit non-
   linear performance over some portion of their operating region.  It
   is possible to construct simple simulation examples where incremental
   improvements to a path (such as raising the link data rate) results
   in lower overall TCP throughput (or BTC) [Mat98].
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   We believe that such non-linearity reflects weakness in our current
   understanding of congestion control and is present to some extent in
   all TCP implementations and BTC metrics.  Note that such non-
   linearity (in either TCP or a BTC metric) is potentially problematic
   in the market because investment in capacity might actually reduce
   the perceived quality of the network.  Ongoing research in congestion
   dynamics has some hope of mitigating or modeling the these non-
   linearities.

   Related areas, including integrated services [RFC1633,RFC2216],
   differentiated services [RFC2475] and Internet traffic analysis
   [MSMO97,PFTK98,Pax97b,LM97] are all currently receiving significant
   attention from the research community.  It is likely that we will see
   new experimental congestion control algorithms in the near future.
   In addition, Explicit Congestion Notification (ECN) [RFC2481] is
   being tested for Internet deployment.  We do not yet know how any of
   these developments might affect BTC metrics, and thus the BTC
   framework and metrics may need to be revisited in the future.

   This document defines a framework for standardizing multiple BTC
   metrics that parallel the permitted transport diversity.  Two
   approaches are used.  First, each BTC metric must be much more
   tightly specified than the typical IETF transport protocol.  Second,
   each BTC methodology is expected to collect some ancillary metrics
   which are potentially useful to support analytical models of BTC.  If
   a BTC methodology does not collect these ancillary metrics, it should
   collect enough information such that these metrics can be derived
   (for instance a segment trace file).

   As an example, the models in [PFTK98, MSMO97, OKM96a, Lak94] all
   predict bulk transfer performance based on path properties such as
   loss rate and round trip time.  A BTC methodology that also provides
   ancillary measures of these properties is stronger because agreement
   with the analytical models can be used to corroborate the direct BTC
   measurement results.

   More importantly the ancillary metrics are expected to be useful for
   resolving disparity between different BTC methodologies.  For
   example, a path that predominantly experiences clustered packet
   losses is likely to exhibit vastly different measures from BTC
   metrics that mimic Tahoe, Reno, NewReno, and SACK TCP algorithms
   [FF96].  The differences in the BTC metrics over such a path might be
   diagnosed by an ancillary measure of loss clustering.
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   There are some path properties which are best measured as ancillary
   metrics to a transport protocol.  Examples of such properties include
   bottleneck queue limits or the tendency to reorder packets.  These
   are difficult or impossible to measure at low rates and unsafe to
   measure at rates higher than the bulk transport capacity of the path.

   It is expected that at some point in the future there will exist an
   A-frame [RFC2330] which will unify all simple path metrics (e.g.,
   segment loss rates, round trip time) and BTC ancillary metrics (e.g.,
   queue size and packet reordering) with different versions of BTC
   metrics (e.g., that parallel Reno or SACK TCP).

2   Congestion Control Algorithms

   Nearly all TCP implementations in use today utilize the congestion
   control algorithms published in [Jac88] and further refined in
   [RFC2581].  In addition to using the basic notion of using an ACK
   clock, TCP (and therefore BTC) implements five standard congestion
   control algorithms: Congestion Avoidance, Retransmission timeouts,
   Slow-start, Fast Retransmit and Fast Recovery.  All BTC
   implementations MUST implement slow start and congestion avoidance,
   as specified in [RFC2581] (with extra details also specified, as
   outlined below).  All BTC methodologies SHOULD implement fast
   retransmit and fast recovery as outlined in [RFC2581].  Finally, all
   BTC methodologies MUST implement a retransmission timeout.

   The algorithms specified in [RFC2581] give implementers some choices
   in the details of the implementation.  The following is a list of
   details about the congestion control algorithms that are either
   underspecified in [RFC2581] or very important to define when
   constructing a BTC methodology.  These details MUST be specifically
   defined in each BTC methodology.

      *  [RFC2581] does not standardize a specific algorithm for
         increasing cwnd during congestion avoidance.  Several candidate
         algorithms are given in [RFC2581].  The algorithm used in a
         particular BTC methodology MUST be defined.

      *  [RFC2581] does not specify which cwnd increase algorithm (slow
         start or congestion avoidance) should be used when cwnd equals
         ssthresh.  This MUST be specified for each BTC methodology.

      *  [RFC2581] allows TCPs to use advanced loss recovery mechanism
         such as NewReno [RFC2582,FF96,Hoe96] and SACK-based algorithms
         [FF96,MM96a,MM96b].  If used in a BTC implementation, such an
         algorithm MUST be fully defined.
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      *  The actual segment size, or method of choosing a segment size
         (e.g., path MTU discovery [RFC1191]) and the number of header
         bytes assumed to be prepended to each segment MUST be
         specified.  In addition, if the segment size is artificially
         limited to less than the path MTU this MUST be indicated.

      *  TCP includes a retransmission timeout (RTO) to trigger
         retransmissions of segments that have not been acknowledged
         within an appropriate amount of time and have not been
         retransmitted via some more advanced loss recovery algorithm.
         A BTC implementation MUST include a retransmission timer.
         Calculating the RTO is subject to a number of details that MUST
         be defined for each BTC metric.  In addition, a BTC metric MUST
         define when the clock is set and the granularity of the clock.

         [RFC2988] specifies the behavior of the retransmission timer.
         However, there are several details left to the implementer
         which MUST be specified for each BTC metric defined.

   Note that as new congestion control algorithms are placed on the
   standards track they may be incorporated into BTC metrics (e.g., the
   Limited Transmit algorithm [ABF00]).  However, any implementation
   decisions provided by the relevant RFCs SHOULD be fully specified in
   the particular BTC metric.

3   Ancillary Metrics

   The following ancillary metrics can provide additional information
   about the network and the behavior of the implemented congestion
   control algorithms in response to the behavior of the network path.
   It is RECOMMENDED that these metrics be built into each BTC
   methodology.  Alternatively, it is RECOMMENDED that the BTC
   implementation provide enough information such that the ancillary
   metrics can be derived via post-processing (e.g., by providing a
   segment trace of the connection).

3.1 Congestion Avoidance Capacity

   The "Congestion Avoidance Capacity" (CAC) metric is the data rate
   (bits per second) of a fully specified implementation of the
   Congestion Avoidance algorithm, subject to the restriction that the
   Retransmission Timeout and Slow-Start algorithms are not invoked.
   The CAC metric is defined to have no meaning across Retransmission
   Timeouts or Slow-Start periods (except the single segment Slow-Start
   that is permitted to follow recovery, as discussed in section 2).
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   In principle a CAC metric would be an ideal BTC metric, as it
   captures what should be TCP's steady state behavior.  But, there is a
   rather substantial difficulty with using it as such.  The Self-
   Clocking of the Congestion Avoidance algorithm can be very fragile,
   depending on the specific details of the Fast Retransmit, Fast
   Recovery or advanced recovery algorithms chosen.  It has been found
   that timeouts and periods of slow start loss recovery are prevalent
   in traffic on the Internet [LK98,BPS+97] and therefore these should
   be captured by the BTC metric.

   When TCP loses Self-Clock it is re-established through a
   retransmission timeout and Slow-Start.  These algorithms nearly
   always require more time than Congestion Avoidance would have taken.
   It is easily observed that unless the network loses an entire window
   of data (which would clearly require a retransmit timeout) TCP likely
   missed some opportunity to safely transmit data.  That is, if TCP
   experiences a timeout after losing a partial window of data, it must
   have received at least one ACK that was generated after some of the
   partial data was delivered, but did not trigger the transmission of
   new data.  Recent research in congestion control (e.g., FACK [MM96a],
   NewReno [FF96,RFC2582], rate-halving [MSML99]) can be characterized
   as making TCP's Self-Clock more tenacious, while preserving fairness
   under adverse conditions.  This work is motivated by how poorly
   current TCP implementations perform under some conditions, often due
   to repeated clock loss.  Since this is an active research area,
   different TCP implementations have rather considerable differences in
   their ability to preserve Self-Clock.

3.2 Preservation of Self-Clock

   Losing the ACK clock can have a large effect on the overall BTC, and
   the clock is itself fragile in ways that are dependent on the loss
   recovery algorithm.  Therefore, the transition between timer driven
   and Self-Clocked operation SHOULD be instrumented.

3.2.1 Lost Transmission Opportunities

   If the last event before a timeout was the receipt of an ACK that did
   not trigger a transmission, the possibility exists that an alternate
   congestion control algorithm would have successfully preserved the
   Self-Clock.  A BTC SHOULD instrument key items in the BTC state (such
   as the congestion window) in the hopes that this may lead to further
   improvements in congestion control algorithms.
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   Note that in the absence of knowledge about the future, it is not
   possible to design an algorithm that never misses transmission
   opportunities.  However, there are ever more subtle ways to gauge
   network state, and to estimate if a given ACK is likely to be the
   last.

3.2.2 Loosing an Entire Window

   If an entire window of data (or ACKs) is lost, there will be no
   returning ACKs to clock out additional data.  This condition can be
   detected if the last event before a timeout was a data transmission
   triggered by an ACK.  The loss of an entire window of data/ACKs
   forces recovery to be via a Retransmission Timeout and Slow-Start.

   Losing an entire window of data implies an outage with a duration at
   least as long as a round trip time.  Such an outage can not be
   diagnosed with low rate metrics and is unsafe to diagnose at higher
   rates than the BTC.  Therefore all BTC metrics SHOULD instrument and
   report losses of an entire window of data.

   Note that there are some conditions, such as when operating with a
   very small window, in which there is a significant probability that
   an entire window can be lost through individual random losses (again
   highlighting the importance of instrumenting cwnd).

3.2.3 Heroic Clock Preservation

   All algorithms that permit a given BTC to sustain Self-Clock when
   other algorithms might not, SHOULD be instrumented.  Furthermore, the
   details of the algorithms used MUST be fully documented (as discussed
   in section 2).

   BTC metrics that can sustain Self-Clock in the presence of multiple
   losses within one round trip SHOULD instrument the loss distribution,
   such that the performance of alternate congestion control algorithms
   may be estimated (e.g., Reno style).

3.2.4  False Timeouts

   All false timeouts, (where the retransmission timer expires before
   the ACK for some previously transmitted data arrives) SHOULD be
   instrumented when possible.  Note that depending upon how the BTC
   metric implements sequence numbers, this may be difficult to detect.
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3.3 Ancillary Metrics Relating to Flow Based Path Properties

   All BTC metrics provide unique vantage points for observing certain
   path properties relating to closely spaced packets.  As in the case
   of RTT duration outages, these can be impossible to diagnose at low
   rates (less than 1 packet per RTT) and inappropriate to test at rates
   above the BTC of the network path.

   All BTC metrics SHOULD instrument packet reordering.  The frequency
   and distance out-of-sequence SHOULD be instrumented for all out-of-
   order packets.  The severity of the reordering can be classified as
   one of three different cases, each of which SHOULD be reported.

      Segments that are only slightly out-of-order should not trigger
      the fast retransmit algorithm, but they may affect the window
      calculation.  BTC metrics SHOULD document how slightly out-of-
      order segments affect the congestion window calculation.

      If segments are sufficiently out-of-order, the Fast Retransmit
      algorithm will be invoked in advance of the delayed packet's late
      arrival.  These events SHOULD be instrumented.  Even though the
      the late arriving packet will complete recovery, the the window
      will still be reduced by half.

      Under some rare conditions segments have been observed that are
      far out of order - sometimes many seconds late [Pax97b].  These
      SHOULD always be instrumented.

   BTC implementations SHOULD instrument the maximum cwnd observed
   during congestion avoidance and slow start.  A TCP running over the
   same path as the BTC metric must have sufficient sender buffer space
   and receiver window (and window shift [RFC1323]) to cover this cwnd
   in order to expect the same performance.

   There are several other path properties that one might measure within
   a BTC metric.  For example, with an embedded one-way delay metric it
   may be possible to measure how queuing delay and and (RED) drop
   probabilities are correlated to window size.  These are open research
   questions.

3.4 Ancillary Metrics as Calibration Checks

   Unlike low rate metrics, BTC SHOULD include explicit checks that the
   test platform is not the bottleneck.

   Any detected dropped packets within the sending host MUST be
   reported.  Unless the sending interface is the path bottleneck, any
   dropped packets probably indicates a measurement failure.
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   The maximum queue lengths within the sending host SHOULD be
   instrumented.  Any significant queue may indicate that the sending
   host has insufficient burst data rate, and is smoothing the data
   being transmitted into the network.

3.5 Ancillary Metrics Relating to the Need for Advanced TCP Features

   If TCP would require advanced TCP extensions to match BTC performance
   (such as RFC 1323 or RFC 2018 features), it SHOULD be reported.

3.6 Validate Reverse Path Load

   To the extent possible, the BTC metric SHOULD distinguish between the
   properties of the forward and reverse paths.

   BTC methodologies which rely on non-cooperating receivers may only be
   able to measure round trip path properties and may not be able to
   independently differentiate between the properties of the forward and
   reverse paths.  In this case the load on the reverse path contributed
   by the BTC metric SHOULD be instrumented (or computed) to permit
   other means of gauge the proportion of the round trip path properties
   attributed to the the forward and reverse paths.

   To the extent possible, BTC methodologies that rely on cooperating
   receivers SHOULD support separate ancillary metrics for the forward
   and reverse paths.

4   Security Considerations

   Conducting Internet measurements raises security concerns.  This memo
   does not specify a particular implementation of a metric, so it does
   not directly affect the security of the Internet nor of applications
   which run on the Internet.  However, metrics produced within this
   framework, and in particular implementations of the metrics may
   create security issues.

4.1 Denial of Service Attacks

   Bulk Transport Capacity metrics, as defined in this document,
   naturally attempt to fill a bottleneck link.  The BTC metrics based
   on this specification will be as "network friendly" as current well-
   tuned TCP connections.  However, since the "connection" may not be
   using TCP packets, a BTC test may appear to network operators as a
   denial of service attack.
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   Administrators of the source host of a test, the destination host of
   a test, and the intervening network(s) may wish to establish
   bilateral or multi-lateral agreements regarding the timing, size, and
   frequency of collection of BTC metrics.

4.2 User data confidentiality

   Metrics within this framework generate packets for a sample, rather
   than taking samples based on user data.  Thus, a BTC metric does not
   threaten user data confidentiality.

4.3 Interference with metrics

   It may be possible to identify that a certain packet or stream of
   packets are part of a BTC metric.  With that knowledge at the
   destination and/or the intervening networks, it is possible to change
   the processing of the packets (e.g., increasing or decreasing delay,
   introducing or heroically preventing loss) that may distort the
   measured performance.  It may also be possible to generate additional
   packets that appear to be part of a BTC metric.  These additional
   packets are likely to perturb the results of the sample measurement.

   To discourage the kind of interference mentioned above, packet
   interference checks, such as cryptographic hash, may be used.

5   IANA Considerations

   Since this metric framework does not define a specific protocol, nor
   does it define any well-known values, there are no IANA
   considerations for this document.  However, a bulk transport capacity
   metric within this framework, and in particular protocols that
   implement a metric may have IANA considerations that need to be
   addressed.
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